Abstract

The availability and capabilities of today's public Internet enable many multimedia applications that were not possible a few years ago. One emerging application is the so-called Networked Music Performance, the online, low-latency interaction of musicians. This work proposes a stand-alone device for that specific purpose, based on a Raspberry Pi running a Linux-based operating system.
1 Introduction

The ways of online communication today are versatile and rapidly evolving. The trend has moved from text-based communication, via audio-based communication, to multimedia-based communication. One emerging branch of online communication is the so-called Networked Music Performance (NMP), a special application of Audio over IP (AoIP). It allows musicians to interact with each other in a virtual acoustic space by connecting their instruments to their computers and linking the computers via software. This allows artistic collaboration over long distances without the need to travel and hence can enrich the lives of artists. Rather than increasing the content dimensionality and therefore the data rate, the challenge in AoIP is to stay below a delay threshold that still allows musical interaction.

To provide an easy-to-use system, this work presents an all-in-one device called the JamBerry. The proposed system, shown in Fig. 1, consists of the well-known Raspberry Pi [1] and several custom hardware extensions. These are necessary because the Raspberry Pi provides neither high-quality audio output nor any audio input. Furthermore, the proposed device includes several hardware components that allow quick and simple connection of typical audio hardware and instruments. The Raspberry Pi itself is a chip-card-sized single-board computer. It was initiated for educational purposes and is now widely used, especially in the hardware hobbyist community, since it provides various interfaces for all sorts of extensions.



Figure 1: The JamBerry Device 


The paper is structured as follows. Section 2 introduces Audio over IP, including the requirements and major challenges of building such a system. Section 3 gives a detailed view of the AoIP software running on the JamBerry. The necessary extensions of the Linux audio drivers and their integration into the ALSA framework are described in Section 4. The custom hardware extensions to the Raspberry Pi are explained in Section 5. Section 6 evaluates the JamBerry in terms of audio and network parameters, and Section 7 concludes the paper.

2 Audio over IP 

Transmission of audio over IP-based networks is nowadays a widespread technology with two main applications: broadcasting of audio streams and telephony. While the former provides no return channel, the latter allows direct interaction over large distances. However, current telephony systems do not fulfill the requirements in terms of audio quality and latency for playing live music together.

The widespread availability of broadband Internet connections and the increase in network reliability now make the realization of such AoIP systems possible. Consequently, this topic has gained much research attention in recent years. A good introduction to Networked Music Performances and the associated problems can be found in [2], while [3] gives an extensive overview of existing systems.

An early approach was SoundWIRE [4] by the Center for Computer Research in Music and Acoustics (CCRMA), where JackTrip [5] was later developed. JackTrip includes several methods for counteracting packet loss, such as overlapping of packets and looping of data in case of a lost packet. It is based on the JACK sound system, just like NetJack [6], which is now part of JACK itself. To avoid the restriction to JACK, Soundjack [7] is based on a generic audio core and hence allows cross-platform musical online interaction.

The Distributed Musical Rehearsal Environment [8] focuses on preparing groups of musicians for a final performance without the need to be at the same place. Remote rehearsal is also one of the applications of the DIAMOUSES framework [9], which offers a very versatile platform including a portal for arranging jam sessions, MIDI support and DVB support for audience involvement.

2.1 Requirements 

The goal of this project was to build a complete distributed music performance system that shows the current state of research and establishes a platform for further research. The system is meant to be usable in realistic environments such as rehearsal rooms. Therefore, it should be a compact system that integrates all important features for easy-to-set-up jamming sessions. This includes two input channels with various input capabilities to support high-amplitude sound sources such as keyboards or preamplified instruments, as well as low-amplitude sound sources like microphones and passive guitar pickups. Furthermore, it should drive headphones and provide line-level output signals.

The system should support a sampling rate of 48 kHz with a bit depth of 16 bits. Higher values would not provide much benefit in quality, since no subsequent signal processing steps that depend on highly detailed signal representations are involved. To allow interaction among several musicians while staying within the computational constraints of the Raspberry Pi, the system shall support up to four interconnected JamBerries.
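For orientation, the raw data rates implied by these parameters can be computed directly. The sketch below assumes one mono stream per device (the software downmixes to a single channel, see Section 3), ignores UDP/IP header overhead, and uses the block sizes of 120 and 240 samples mentioned in Section 3; the function names are illustrative only.

```cpp
#include <cassert>

// Raw PCM data rate in bit/s for the given format.
double raw_bitrate(double sample_rate_hz, int bits_per_sample, int channels) {
    return sample_rate_hz * bits_per_sample * channels;
}

// Duration of one audio block in milliseconds.
double block_duration_ms(int samples_per_block, double sample_rate_hz) {
    assert(sample_rate_hz > 0.0);
    return 1000.0 * samples_per_block / sample_rate_hz;
}

// Upstream payload rate of one device that sends a separate unicast copy of
// its stream to every other device in the session (UDP/IP headers ignored).
double upstream_bitrate(double per_stream_bps, int num_devices) {
    assert(num_devices >= 2);
    return per_stream_bps * (num_devices - 1);
}
```

For 48 kHz and 16 bits this gives 768 kbit/s per raw mono stream, about 2.3 Mbit/s of upstream payload when one JamBerry sends to three peers, and block durations of 2.5 ms (120 samples) or 5 ms (240 samples). Opus coding reduces the data rate further.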

2.2 Challenges 

Transmission of audio via the Internet differs considerably from point-to-point digital audio transmission techniques such as AES/EBU [10] and even from Audio over Ethernet (AoE) techniques like Dante or EtherSound [11]. The transmission of data packets via the Internet is neither reliable nor properly predictable. As a result, audio data can be considerably delayed or even lost in the network.

This is commonly counteracted by using large data buffers in which packets arriving at irregular time intervals are additionally delayed so that late packets can catch up. Unfortunately, large buffers conflict with the requirements of distributed music performances, since minimum latency is essential: interaction of several musicians is only possible when the round-trip delay does not exceed a certain threshold [12; 13]. Moreover, even large buffers do not prevent dropouts resulting from lost packets. Therefore, this project takes two complementary approaches:

• Audio data packets that do not arrive in time are replaced using a technique called error concealment. Instead of playing back silence, audio is calculated from the preceding data.

• The data buffer length is dynamically adjusted to the network conditions. This enables minimum latency while still providing good audio quality.
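The simplest form of concealment discussed in this paper, repeating the last block until newer data arrives (the "wavetable mode" of [5], see Section 3), can be sketched as follows. The class name and interface are assumptions for illustration; by default the JamBerry uses the concealment of the Opus codec instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Block = std::vector<std::int16_t>;

// Block-repetition concealment: remember the last block that arrived in time
// and replay it whenever the block for the current period is missing.
class RepeatConcealer {
public:
    // Starts with a block of silence until the first real block arrives.
    explicit RepeatConcealer(std::size_t block_size) : last_(block_size, 0) {}

    // A block arrived in time: store it and pass it through unchanged.
    const Block& on_received(const Block& block) {
        assert(block.size() == last_.size());
        last_ = block;
        return last_;
    }

    // The block is missing: replay the most recent good block.
    const Block& on_missing() const { return last_; }

private:
    Block last_;
};
```

Repetition is cheap, but over longer gaps it can produce an audible periodic artifact at the block rate, which is one reason model-based concealment such as that of Opus is preferable.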

3 Software System 

The AoIP software itself is a multi-threaded C++11 application running in user space. It accesses the audio hardware via the well-known ALSA [14] library. User interaction takes place via a WebSocket interface that enables a JavaScript/HTML GUI, which can be accessed via the integrated touchscreen as well as from a remote PC or tablet. The WebSocket interface is provided by a library [15] that was written during this project and implements the WAMP [16] protocol.
The data flow of the audio through the software is depicted in Fig. 2. Audio is captured from the hardware via the ALSA library. As soon as a block (120 or 240 samples) of new data is available, it is taken by the CaptureController, which mixes the signal down to a single channel. Transmitting multiple streams is possible, too, but provides a negligible benefit in this scenario. The data can be transmitted as raw data. Alternatively, the required data rate can be reduced by using the low-latency Opus codec [17; 18]. The encoding is done by the EncoderStream, which passes the data to the sender for sending it to all connected peers via unicast UDP. Currently, no discovery protocol is implemented, so the peers have to be entered manually.

As soon as the data is received at the receiver, it is decoded and pushed into the receiver buffer queue. The PlaybackController mixes the data from the various sources and makes the result available to ALSA, so that data can be read continuously. In the case of missing data, an error concealment procedure is triggered to replace the missing data and avoid gaps in the playback. The current implementation uses the concealment procedure of the Opus codec, since its complexity is low compared with other known concealment strategies [19; 20; 21]. Alternatively, the last block can be repeated until newer data arrives (the so-called "wavetable mode", as in [5]). The queuing process at the receiver is explained in more detail in the following.

Figure 2: Data flow of the JamBerry software

3.1 Adaptive Queuing

In order to achieve minimum latency while maintaining good audio quality, the length of the audio buffer is adjusted to the network conditions within the playback thread. The corresponding control routine is depicted in Fig. 3.


Figure 3: Process of Playback Thread 

The ALSA data queue is kept very short to avoid unnecessary delays that would increase the overall latency. The PlaybackController monitors the state of ALSA and writes new data to the ALSA buffer just before the hardware requests it. If current audio data is available at the moment of the hardware request, this data is used. In the case of missing data, the error concealment routine is triggered to produce the corresponding data. Since the computation of concealment data takes some time, this period is taken into account so that the data is provided at exactly the right point in time.
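The per-period decision described above can be sketched as two small functions: one computes the latest start time by reserving the worst-case concealment time plus a safety margin, the other takes a queued block or falls back to concealment. The names, the fixed margin and the use of std::chrono are assumptions for illustration, not the actual interface of the PlaybackController.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using Block = std::vector<std::int16_t>;
using Clock = std::chrono::steady_clock;
using std::chrono::microseconds;

// Latest moment at which preparation of the next block has to start so that
// even a concealed block is ready when the hardware requests new data.
Clock::time_point latest_start(Clock::time_point hw_request,
                               microseconds conceal_cost,
                               microseconds margin) {
    assert(conceal_cost.count() >= 0 && margin.count() >= 0);
    return hw_request - conceal_cost - margin;
}

// Uses the oldest queued block if one arrived in time; otherwise asks the
// concealment routine to synthesize a replacement block.
Block next_block(std::deque<Block>& queue, const std::function<Block()>& conceal) {
    if (!queue.empty()) {
        Block block = std::move(queue.front());
        queue.pop_front();
        return block;
    }
    return conceal();
}
```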

In order to maintain a reasonable buffer size, a simple open-loop control was implemented. An unreasonably large buffer would only add delay. When the buffer is too small, a major part of the audio packets arrives too late. Although a certain amount of packets can be concealed, the audio quality decreases as the amount of lost packets rises.
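The trade-off can be made concrete with a simple model: a packet counts as lost for playback if its one-way delay exceeds the minimum network delay plus the buffering added by the receiver. The sketch below, with hypothetical names and delays in milliseconds, counts such late packets for a given buffer length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of packets that miss their playout deadline. A packet is late if
// its delay exceeds the base (minimum) delay plus the receiver's buffering.
double late_fraction(const std::vector<double>& delays_ms,
                     double base_delay_ms, double buffer_ms) {
    assert(!delays_ms.empty());
    std::size_t late = 0;
    for (double d : delays_ms) {
        if (d > base_delay_ms + buffer_ms) {
            ++late;
        }
    }
    return static_cast<double>(late) / delays_ms.size();
}
```

Increasing buffer_ms can only lower the late fraction, but every added millisecond of buffering also adds to the end-to-end latency.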

Right after a new connection has been established, the buffer size is set to a very high value. During the following few seconds, the queue length $Q$ in samples is measured and its standard deviation $\sigma_Q$ is calculated. After the measurement phase, the optimal queue length is calculated as

$$Q_\mathrm{opt} = \beta \cdot \sigma_Q , \qquad (1)$$

where the constant $\beta > 1$ accounts for packets outside the range of the standard deviation.
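The measurement phase and Eq. (1) can be sketched as follows. The use of the sample standard deviation, the function names and the choice to correct the queue all the way back to $Q_\mathrm{opt}$ (rather than only to the edge of the tolerance interval described next) are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sample standard deviation of the queue lengths (in samples) recorded during
// the measurement phase after a connection has been established.
double queue_stddev(const std::vector<double>& q) {
    assert(q.size() >= 2);
    double mean = 0.0;
    for (double x : q) mean += x;
    mean /= q.size();
    double sum_sq = 0.0;
    for (double x : q) sum_sq += (x - mean) * (x - mean);
    return std::sqrt(sum_sq / (q.size() - 1));
}

// Eq. (1): target queue length Q_opt = beta * sigma_Q with beta > 1.
double optimal_queue_length(const std::vector<double>& q, double beta) {
    assert(beta > 1.0);
    return beta * queue_stddev(q);
}

// Samples to drop (> 0) or generate (< 0) when the current queue length lies
// outside [q_opt - q_tol, q_opt + q_tol]; 0 while it stays inside.
long correction_samples(double q_now, double q_opt, double q_tol) {
    assert(q_tol >= 0.0);
    if (q_now > q_opt + q_tol || q_now < q_opt - q_tol) {
        return std::lround(q_now - q_opt);
    }
    return 0;
}
```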
When the current queue length $Q$ lies outside the interval with tolerance $Q_\mathrm{tol}$

$$\left[\, Q_\mathrm{opt} - Q_\mathrm{tol},\; Q_\mathrm{opt} + Q_\mathrm{tol} \,\right] , \qquad (2)$$

the corresponding number of samples is dropped or generated. Once the queue is adjusted to the current network characteristics, this occurs very infrequently, so the audible effect is insignificant. The parameters $\beta$ and $Q_\mathrm{tol}$ are used to trade off the amount of lost packets, and therefore the audio quality, against the latency.

4 Linux Kernel Driver

The Raspberry Pi has neither audio input nor proper audio output. The existing audio output is driven by a pulse-width modulation (PWM) interface providing medium-quality audio. Fortunately, there is another option for audio transmission: the Broadcom SoC on the Raspberry Pi provides a digital I²S interface [22] that can be accessed via pin headers. Together with an external audio codec, as explained in the next section, this enables high-quality audio input and output. However, the Linux kernel lacked support for the I²S peripheral of the Raspberry Pi. An integral part of this project was therefore to write an appropriate kernel driver.

Since this driver should be as generic as possible, it is implemented as part of the ALSA System on Chip (ASoC) framework. ASoC is a subsystem of ALSA tailored to the needs of embedded systems; it provides helpful abstractions that make it easy to adapt the driver for use with other external hardware. Today, a considerable number of both open and commercial hardware products use the driver developed during this project.


Figure 4: Structure of ASoC and its embedding into the Linux audio framework


Fig. 4 depicts the general structure of ASoC as used for this project. When an application starts audio playback, it calls the corresponding function of the ALSA library. This in turn calls the appropriate initializers of the involved peripheral drivers, which are listed in the machine driver. In particular, these are the codec driver, which is responsible for control commands via I²C; the I²S driver, which controls the digital audio interface; and the platform driver, which commands the DMA engine driver. DMA (Direct Memory Access) transfers audio data from main memory to the I²S peripheral and back. The I²S peripheral forwards this data via the I²S interface to the audio codec. To start playback on the codec, the codec driver sends an appropriate command via the I²C subsystem. The codec driver is also used to transmit other codec settings such as volume.

These encapsulations and generic interfaces are the reason for the flexibility and reusability of the software structure. To use a new audio codec with the Raspberry Pi, only the codec driver and the slim machine driver have to be replaced. In many cases, only the wiring in the machine driver has to be adapted, since many codec drivers are already available. The wide availability of these drivers results from the frequent use of ASoC on different platforms.















5 Hardware 

Since the Raspberry Pi does not provide proper analog audio interfaces, major effort was spent on designing audio hardware that matches the NMP requirements. Furthermore, a touchscreen for user-friendly interaction was connected, which requires interfacing hardware. Due to these extensions, the JamBerry can be used as a standalone device without the need for external peripherals such as a monitor.

An overview of the external hardware is depicted in Fig. 5. The functionality of the extension is distributed module-wise over three printed circuit boards. The codec board contains the audio codec for conversion between the analog and digital domains and is stack-mounted on the Raspberry Pi. It is connected to the amplifier board, which contains several amplifiers and connectors. The third board controls the touchscreen and is connected to the Raspberry Pi via HDMI. The individual boards are explained in more detail in the following.
Figure 5: Hardware Overview

5.1 Codec Board 

The main component of the codec board is a CS4270 audio codec by Cirrus Logic. It has integrated AD and DA converters that support sample rates of up to 192 kHz and a maximum of 24 bits per sample. It is connected to the I²S interface of the Raspberry Pi for transmission of digital audio and to the I²C interface for control. A linear voltage regulator supplies the analog part of the audio codec, while the digital part is powered directly from the supply line of the Raspberry Pi. The audio codec is clocked by an external master clock generator. This enables fine-grained synchronization of the sampling frequency on different devices and prevents clock drift, as shown in [23]. The MAX9485 clock generator provides this capability through a voltage-controlled oscillator that can be tuned by an external DAC.
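To illustrate why this synchronization matters, the sketch below computes how quickly two free-running sample clocks diverge. The 50 ppm offset used in the example is an assumed order of magnitude for an unsynchronized crystal, not a measured JamBerry value, and the function names are hypothetical.

```cpp
#include <cassert>

// Samples by which two free-running devices drift apart per second when
// their sample clocks differ by offset_ppm parts per million.
double drift_samples_per_second(double fs_hz, double offset_ppm) {
    assert(fs_hz > 0.0);
    return fs_hz * offset_ppm * 1e-6;
}

// Time until the accumulated drift amounts to one full audio block.
double seconds_until_block_slip(double fs_hz, double offset_ppm, int block_samples) {
    const double drift = drift_samples_per_second(fs_hz, offset_ppm);
    assert(drift > 0.0);
    return block_samples / drift;
}
```

With an assumed offset of 50 ppm at 48 kHz, the receiver's queue would gain or lose 2.4 samples per second, i.e. a full 120-sample block within 50 s, which the adaptive queue would otherwise have to absorb by dropping or generating samples.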

5.2 Amplifier Board 

The amplifier board is designed to provide the most common connection options. On the input side, two combined XLR/TRS connectors allow the connection of various sources such as microphones, guitars or keyboards. Since these sources provide different output levels that have to be amplified to line level before being fed into the audio codec, a two-stage non-inverting operational amplifier circuit is placed in front of the AD converter in each channel. It is based on OPA2134 amplifiers by Texas Instruments, which have proven their suitability in previous guitar amplifier projects. The circuit allows an amplification of up to 68 dB.
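The gains of the two cascaded non-inverting stages multiply, so their gains in decibels add. The resistor values in the usage below are hypothetical, chosen only to show how two stages of roughly 34 dB each reach the stated maximum of about 68 dB; they are not taken from the JamBerry schematic.

```cpp
#include <cassert>
#include <cmath>

// Voltage gain of an ideal non-inverting op-amp stage: 1 + Rf / Rg.
double noninverting_gain(double rf_ohm, double rg_ohm) {
    assert(rf_ohm >= 0.0 && rg_ohm > 0.0);
    return 1.0 + rf_ohm / rg_ohm;
}

// Voltage gain expressed in decibels.
double to_db(double gain) {
    assert(gain > 0.0);
    return 20.0 * std::log10(gain);
}
```

With Rf = 49 kΩ and Rg = 1 kΩ, each stage has a gain of 50 (about 34 dB), and the cascade reaches 2500, or about 67.96 dB.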

On the output side, a direct line-level output is provided as well as a MAX13331 headphone amplifier, which can deliver up to 135 mW into 32 Ω headphones. Furthermore, the amplifier board contains the main power supply of the JamBerry.
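The output power rating translates into an output voltage as sketched below, assuming a sine signal and headphones that behave as a purely resistive 32 Ω load; real headphone impedance varies with frequency, so this is only an approximation, and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// RMS voltage across a resistive load for a given power: V = sqrt(P * R).
double rms_voltage(double power_w, double load_ohm) {
    assert(power_w >= 0.0 && load_ohm > 0.0);
    return std::sqrt(power_w * load_ohm);
}

// Peak voltage of a sine wave with the given RMS value.
double sine_peak(double v_rms) {
    return v_rms * std::sqrt(2.0);
}
```

For 135 mW into 32 Ω this gives about 2.08 V RMS, or roughly 2.94 V peak.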

5.3 Touchscreen Driving Board 

In order to provide enough display space for a pleasant user experience while maintaining a compact system size, a 7" screen is used. A frequently used, thus reliable, and affordable resistive touchscreen of that size is the AT070TN92. To use it with the Raspberry Pi, a video signal converter is needed to translate from HDMI to the 24-bit parallel interface of the TFT screen. This is provided by a TFP401A by Texas Instruments. The touch position on the screen can be determined by measuring the resistance across the touch panel. This measurement is subject to noise, which induces jitter and results in imprecise mouse pointer positions. The AD7879-1W touch controller is used to conduct this measurement, since it provides integrated mean and median filters that reduce the jitter, and it is controlled via I²C. A DAC for controlling the backlight of the TFT uses the same interface. An additional cable for the I²C connection was avoided by reusing the DDC interface inside the HDMI cable as the carrier for the touch and brightness information.
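The combination of median and mean filtering can be illustrated in software: a median over a few raw readings rejects single outliers, and averaging several medians smooths the remaining jitter. This is a generic sketch with hypothetical names and window sizes; it does not reproduce the internal filter configuration of the AD7879.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Median of one window of raw readings (for even sizes, the upper middle value).
int window_median(std::vector<int> window) {
    assert(!window.empty());
    auto mid = window.begin() + static_cast<std::ptrdiff_t>(window.size() / 2);
    std::nth_element(window.begin(), mid, window.end());
    return *mid;
}

// Median over consecutive, non-overlapping windows, then the mean of the medians.
// Trailing readings that do not fill a complete window are ignored.
double filter_touch(const std::vector<int>& raw, std::size_t window) {
    assert(window > 0 && raw.size() >= window);
    double sum = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i + window <= raw.size(); i += window) {
        auto first = raw.begin() + static_cast<std::ptrdiff_t>(i);
        std::vector<int> w(first, first + static_cast<std::ptrdiff_t>(window));
        sum += window_median(w);
        ++count;
    }
    return sum / count;
}
```

In the test below, the outlier reading of 500 is rejected by the median of its window, so it does not pull the filtered position away from about 100.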




































6 Evaluation 

The system was evaluated in terms of overall 
latency introduced by the network as well as 
audio quality. 

6.1 Network 

In order to evaluate the behavior of the system under various and reproducible network conditions, a network simulator was implemented. Fig. 6 shows a single JamBerry device connected to the simulator, which bounces the received data back to the sender.



Figure 6: Software Evaluation System 


To calibrate the simulator to real conditions, a network connection of 13 hops to a server at a distance of 450 km is used. Fig. 7 shows the distribution of the packet delay. The average delay is about 18 ms with a standard deviation of 4 ms.
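A delay model for such a simulator can be sketched as below. It assumes normally distributed delays with the measured mean of 18 ms and standard deviation of 4 ms, clipped at zero; the measured distribution in Fig. 7 need not be Gaussian, and the class is a hypothetical illustration rather than the simulator used for this evaluation.

```cpp
#include <algorithm>
#include <cassert>
#include <random>

// Per-packet one-way delay generator for a simple network simulator:
// normally distributed delays, clipped at zero since delays cannot be negative.
class DelayModel {
public:
    DelayModel(double mean_ms, double stddev_ms, unsigned seed)
        : rng_(seed), dist_(mean_ms, stddev_ms) {
        assert(stddev_ms > 0.0);
    }

    double next_delay_ms() { return std::max(0.0, dist_(rng_)); }

private:
    std::mt19937 rng_;
    std::normal_distribution<double> dist_;
};
```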

Figure 7: Time series and histogram of the packet delay over the test route (packet delay in ms over time in s; count over packet delay in ms)


The overall latency is measured by generating short sine bursts and feeding them into the JamBerry. This signal is compared with the resulting output signal by means of an oscilloscope. In addition, GPIO pins of the Raspberry Pi are toggled when the sine burst is processed in the different software modules presented in Section 3.
Figure 8: Journey of a Sine Burst

A resulting oscillogram can be seen in Fig. 8. The overall latency is about 40 ms. The time between sending and reception is 15 ms, which matches the time for the actual transmission. Between decoding and mixing, the packet is delayed in the buffer queue for about 16 ms. This buffering is needed to compensate for the high jitter of the connection.

For the following evaluation, the overall latency is measured using the above method while recording the amount of lost packets. Fig. 9 demonstrates the influence of the factor $\beta$ in Eq. (1) for a constant jitter variance of 9.5 ms². With low $\beta$, the optimal queue length is short, so the overall latency is short, too. However, since there is less time for late packets to catch up, the amount of packet loss is very high. With increasing $\beta$, the amount of lost packets decreases, but the latency increases. Since sophisticated error concealment algorithms can compensate for up to 2% packet loss [19], a constant $\beta = 3$ is chosen for the next evaluation, which is illustrated in Fig. 10. It demonstrates how the control algorithm handles various network conditions. With increasing network jitter variance, the system adapts itself by using a longer queue length. This increases the overall latency, but not the packet loss, so the audio quality stays constant.
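As a rough worked example of Eq. (1) for the setting of Fig. 9, one can assume that the standard deviation of the queue length roughly equals that of the network jitter (an assumption, since $\sigma_Q$ is measured on the queue itself) and convert the result to samples at 48 kHz:

$$Q_\mathrm{opt} \approx \beta \sqrt{9.5\ \mathrm{ms}^2} = 3 \cdot 3.08\ \mathrm{ms} \approx 9.25\ \mathrm{ms}, \qquad 9.25\ \mathrm{ms} \cdot 48\ \mathrm{samples/ms} \approx 444\ \mathrm{samples}$$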







6.2 Audio 


The audio quality of the JamBerry was evaluated module-wise: the audio output, the audio input, the headphone amplifier and the preamplifiers were inspected independently. First of all, the superiority of the proposed audio output over the original PWM output shall be demonstrated.
Figure 11: THD of the Raspberry Pi PWM and the codec board output for a 1 kHz sine using different signal levels

When a 1 kHz sine tone is played back through both outputs and the corresponding output spectra are inspected, as in Fig. 12, it becomes apparent that the new codec board increases the quality significantly. The PWM output introduces more distortion, visible in Fig. 12 as larger harmonics at multiples of the fundamental frequency. For example, the amplitude of the first harmonic above the fundamental differs by about 40 dB. The noise floor of the codec board at higher frequencies is also significantly lower, with a difference of up to 10 dB visible in Fig. 12. At 50 Hz, ripple voltage from the power supply can be seen; a power supply of higher quality could reduce this disturbance.

| | Level in dBFS | THD in dB | SNR in dB |
|---|---|---|---|
| Output: PWM output | 0 | -57 | 55 |
| Output: Codec output | 0 | -81 | 80 |
| Input: Codec input | 0 | -91 | 71 |

Table 1: Digital audio hardware characteristics

| Amplifier | Gain in dB | THD in dB | SNR in dB |
|---|---|---|---|
| Headphone | 16 | -85 | 79 |
| Input | 17 | -81 | 66 |
| Input | 34 | -74 | 48 |

Table 2: Analog audio hardware characteristics

The distortion and noise that audio hardware adds to a signal are typically expressed as total harmonic distortion (THD) and signal-to-noise ratio (SNR), respectively. In most conventions, THD describes the ratio of the energy of the harmonics produced by distortion to the energy of the actual signal. In contrast, SNR represents the ratio between the original signal energy and the added noise energy.
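Following the energy-ratio convention just described, both figures can be computed from RMS amplitudes read off a spectrum. The sketch is illustrative only: the function names are hypothetical, and it does not reproduce the measurement procedure (bandwidth, weighting, number of harmonics) behind Tables 1 and 2, which the text does not specify.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// THD in dB: energy of the harmonics relative to the energy of the fundamental.
// Returns -infinity if no harmonic energy is present.
double thd_db(double fundamental_rms, const std::vector<double>& harmonic_rms) {
    assert(fundamental_rms > 0.0);
    double harmonic_energy = 0.0;
    for (double a : harmonic_rms) harmonic_energy += a * a;
    return 10.0 * std::log10(harmonic_energy / (fundamental_rms * fundamental_rms));
}

// SNR in dB from the RMS values of the signal and of the added noise.
double snr_db(double signal_rms, double noise_rms) {
    assert(signal_rms > 0.0 && noise_rms > 0.0);
    return 20.0 * std::log10(signal_rms / noise_rms);
}
```

For example, a single harmonic at one thousandth of the fundamental amplitude corresponds to a THD of -60 dB.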

The THD values of the two outputs are shown for several signal levels in Fig. 11. The THD of the new codec board is at least 20 dB lower than that of the original output for all analyzed signal levels.

Figure 12: Spectra of the Raspberry Pi PWM and the codec board output for a 1 kHz sine

These results and the corresponding measurement results of the other audio hardware modules are listed in Tables 1 and 2. The measured values should allow high-quality capturing and playback of instruments. Analog amplification always adds noise; therefore, the THD and SNR of the input amplifier degrade as the gain increases. For even better quality, the flexibility of the device allows the connection of almost any kind of music equipment, such as favorite guitar amplifiers or vintage synthesizers.

7 Conclusions 

The goal of this project was to create a standalone device, called the JamBerry, capable of delivering the experience of a distributed music performance in a user-friendly way. The device is based on the popular Raspberry Pi and is enhanced by several custom hardware extensions: a digital and an analog extension board, providing high-quality audio interfaces to the Raspberry Pi, and a touchscreen that allows standalone operation of the device.

The performance was evaluated under lab conditions, and the authors expect that the system, and especially its audio quality, will satisfy the needs of most musicians. Besides the described device design, the main author shares the ALSA kernel driver, which has been included in the mainline Linux kernel since version 3.14 and allows versatile connection of the Raspberry Pi to external audio hardware.